Conversation

@Suryakantdsa commented on Aug 2, 2025

Pull Request Template

Description:

  • Added a GCS file provider in pkg/datasource/file/gcs.
  • Implemented Create, Remove, ReadDir, Open, Stat, and MakeDir using cloud.google.com/go/storage.
  • Simulates directories using object key prefixes (a brief sketch of this pattern follows below).
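
For illustration, a minimal sketch (not this PR's actual code) of how prefix-based directory simulation typically works with cloud.google.com/go/storage; the listDir helper name is hypothetical:

package gcs

import (
	"context"
	"fmt"

	"cloud.google.com/go/storage"
	"google.golang.org/api/iterator"
)

// listDir lists a simulated "directory" by treating the object key prefix
// as a path and the delimiter as a directory separator.
func listDir(ctx context.Context, bucket *storage.BucketHandle, dir string) error {
	it := bucket.Objects(ctx, &storage.Query{
		Prefix:    dir + "/", // the key prefix acts as the directory path
		Delimiter: "/",       // groups deeper keys into sub-"directories"
	})

	for {
		attrs, err := it.Next()
		if err == iterator.Done {
			return nil
		}

		if err != nil {
			return err
		}

		// attrs.Name is set for objects ("files"); attrs.Prefix is set
		// for collapsed sub-prefixes (simulated sub-directories).
		fmt.Println(attrs.Name + attrs.Prefix)
	}
}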

Checklist:

  • I have formatted my code using goimports and golangci-lint.
  • All new code is covered by unit tests.
  • This PR does not decrease the overall code coverage.
  • I have reviewed the code comments and documentation for clarity.

Fixes #2013

@Suryakantdsa (Author)

Hi @Umang01-hash, @coolwednesday,

Please have a look at my PR and let me know if you notice anything that needs changes.

Thanks!

@coolwednesday (Member)

Hey!

Can you add setup details for ease of testing, plus documentation for users who will want to set this up by referring to the docs once this feature ships?

@Suryakantdsa (Author) commented on Aug 4, 2025

Sure @coolwednesday, I will add the setup guide and documentation to help users use this feature.

Thanks for the feedback 👍

@Suryakantdsa (Author)

Hey @coolwednesday,

I noticed that the setup details for file handling (S3, FTP/SFTP, etc.) are already in gofr/docs/advanced-guide/handling-file/page.md.

Just to confirm: should I add the GCS setup instructions in the same file, or would you prefer a separate file?

Thanks!

@Suryakantdsa (Author)

Hey @coolwednesday,

Could you please take a look at the question above?

@Suryakantdsa (Author)

Hey @ccoVeille, @Umang01-hash,
Could you please take a look at the question above?

@coolwednesday (Member) left a comment

Please update the unified handling-files documentation to include a code snippet demonstrating how to add GCS support.

Additionally, make the necessary changes in the Container package so that it works like any other datasource. For reference, review PR #2076 and replicate the relevant changes.

After that, ensure traces, logs, and metrics are integrated (also refer to #2076 for implementation details).

Once implemented:

  • Add screenshots showing logs, traces, and metrics from the running cloud store.
  • Add setup documentation in the docs folder.
  • Update the navigation.js.
  • Include any Docker container commands used in contributing.md.

@Suryakantdsa (Author)

Ok, got it. I'm working on it and will let you know once it's completed.
Thanks, @coolwednesday!

@Suryakantdsa force-pushed the feat/gcs-file-provider branch from 8c79d6a to 7b1824a on August 16, 2025
@Suryakantdsa (Author)

Hey @Umang01-hash, @coolwednesday,

  • I have updated the existing unified file-handling documentation (handling-file.md) to include GCS setup and usage (no new navigation entry added).
  • Included Docker container commands in contributing.md.

Here are some screenshots showing logs, traces, and metrics from the running cloud store.

[Screenshots: logs, traces, and metrics from the running cloud store]

Please let me know if you notice anything that needs changes.
Thank you so much!

@coolwednesday (Member) left a comment

Other than this, the whole implementation looks good to me.

@akshat-kumar-singhal (Contributor) left a comment

An overall rewrite of the file package should be considered before adding more implementations like GCS.

@Suryakantdsa (Author)

Hi @Umang01-hash,

My branch was not up to date with gofr/development. I've rebased it and pushed the updates now.
Please have a look.

Thank you!

@Umang01-hash (Member) left a comment

  • Please remove all the go.mod changes in the other datasources, as they are outside the scope of your PR. The changes inside examples/using-add-filestore are also not required; please revert all of these.
  • Only some of the newly added code has tests. Please write tests using mocks for all of the newly added code (feel free to refer to other datasources to see how they use mocks for testing).
  • Only export errors, constants, etc. when needed; unexport the ones not used outside the directory.

@Suryakantdsa (Author)

Hi @Umang01-hash,
I've noted the required changes. I'm working on them and will update you once they're completed.

Thanks for your time.

@Suryakantdsa (Author)

Hi @Umang01-hash,

I’ve made the required changes. Please have a look when you get a chance.

Thank you!

@Umang01-hash (Member) left a comment

Hey @Suryakantdsa, thank you for making the requested changes.

After running the tests, here is your current code coverage:
[Screenshot: code-coverage report, Sept 5, 2025]

Please make sure that each file (except mocks and interfaces) has at least 75-80% code coverage.

  • Also, in the example you provided in the documentation, the gcs.Config:
&gcs.Config{
		BucketName:      "my-bucket",
		CredentialsJSON: readFile("gcs-credentials.json"),
		ProjectID:       "my-project-id",
	}

The Endpoint field is missing here; see the sketch below.
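
For example, a hedged sketch with the missing field added (the endpoint value shown is an assumption, e.g. a local emulator):

&gcs.Config{
	BucketName:      "my-bucket",
	CredentialsJSON: readFile("gcs-credentials.json"),
	ProjectID:       "my-project-id",
	Endpoint:        "http://localhost:4443", // assumed value for a local GCS emulator
}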

@Umang01-hash (Member)

@Suryakantdsa are you still working on this PR?

@Suryakantdsa (Author)

Hi @Umang01-hash,
I was unavailable earlier due to a medical emergency, but I'm back to working on this PR now.

@Suryakantdsa (Author)

Hi @Umang01-hash,

I have addressed the required changes.
Please take a look when you get a chance.

Thanks!

@Umang01-hash previously approved these changes on Oct 15, 2025
@Umang01-hash (Member)

@Suryakantdsa Thank you for all the changes; the PR looks good now. Only the retry part is left, for which I have created another issue since it can be handled in another PR.

@akshat-kumar-singhal (Contributor)

@Umang01-hash I see a few areas of improvement:

  • WriteAt, ReadAt, etc. are implemented as "not supported" - why do we need them unless we're embedding the ReaderAt/WriterAt interfaces in our interface and hence must implement them?
  • The metrics we're exposing have the location, i.e. file path, as a label value - I hope we've considered the cardinality aspect here.
  • Metrics exposed across AWS/GCP (and other providers later) should be consistent - consider a single implementation in the main file package, restricting the respective packages to the file-transfer implementation only. A unified metric set with a label for storage type (AWS/GCP/FTP, etc.) would give application developers better visibility and limit the number of new metrics/panels to be added.
  • We could view the entire file-system implementation as file operations + storage/retrieval: the file operations are largely common and can be implemented in the file package, while retrieval/storage is implemented in the respective packages (FTP/SFTP/S3/GCS, etc.). Those implementations can additionally check a version key/lock before updating the file on the remote to ensure consistent writes.
  • A separate FileLog for GCS could be avoided - it should be available in the file package itself.
  • The definition of standard constants like SUCCESS and ERROR looks off - I understand that "a little copying is better than a little dependency", but we could have these in the file package, which we'd already be importing in the various implementations.
  • Defining histogram buckets within the implementation package should be avoided.

cc @coolwednesday @gizmo-rt @Suryakantdsa

@Umang01-hash (Member)


@akshat-kumar-singhal Thanks for your valuable feedback and review.

Let's try and address these comments one by one:

1. WriteAt, ReadAt, etc. are implemented as "not supported" - why do we need them unless we're embedding the ReaderAt/WriterAt interfaces in our interface and hence must implement them?

  • These methods must be implemented because the File struct must satisfy the file.File interface:
// File represents a file in the filesystem.
type File interface {
	io.Closer
	io.Reader
	io.ReaderAt
	io.Seeker
	io.Writer
	io.WriterAt

	ReadAll() (RowReader, error)
}

which embeds io.ReaderAt, io.Seeker, and io.WriterAt. By implementing them with error returns, we maintain interface compliance while clearly indicating these operations aren't supported by GCS.

2. The metrics we're exposing have the location, i.e. file path, as a label value - hope we've considered the cardinality aspect here.

  • Looking at the implementation, I found that file paths (Location) are only included in debug logs via f.logger.Debug(fl), not in metrics. The metrics histogram only uses operation type and status as labels ("type", fl.Operation, "status", clean(fl.Status)), keeping cardinality manageable. A minimal sketch of this label scheme follows below.
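
For illustration only (the Metrics interface stands in for GoFr's metrics manager, and the metric name is hypothetical):

import "context"

type Metrics interface {
	RecordHistogram(ctx context.Context, name string, value float64, labels ...string)
}

// observeOperation records only operation type and status as labels, so
// cardinality stays bounded regardless of how many distinct paths are accessed.
func observeOperation(ctx context.Context, m Metrics, operation, status string, seconds float64) {
	m.RecordHistogram(ctx, "app_gcs_stats", seconds,
		"type", operation,
		"status", status,
	)
}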

3. Metrics exposed across AWS/GCP (and other providers later) should be consistent - consider a single implementation in the main file package, restricting the respective packages to the file-transfer implementation only. A unified metric set with a label for storage type (AWS/GCP/FTP, etc.) would give application developers better visibility and limit the number of new metrics/panels to be added.

4. We could view the entire file-system implementation as file operations + storage/retrieval: the file operations are largely common and can be implemented in the file package, while retrieval/storage is implemented in the respective packages (FTP/SFTP/S3/GCS, etc.). Those implementations can additionally check a version key/lock before updating the file on the remote to ensure consistent writes.

  • I agree that a unified metrics system and better separation between common operations and provider-specific code would be beneficial. However, this PR is specifically focused on adding GCS integration to match existing functionality. The TODO comment already acknowledges the need to refactor both implementations to use a common interface:
// TODO: Future improvement - Refactor both S3 and GCS implementations to use a
// common CloudStorageClient interface for better abstraction. This implementation
// currently follows the same pattern as S3 to maintain consistency with existing code.
  • Following single-responsibility principles for PRs, I propose implementing these architectural improvements in a dedicated follow-up PR. This keeps the refactoring separate from the GCS integration work, making both changes easier to review and more reliable. A hypothetical sketch of such a common interface follows below.
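
A hypothetical sketch of what that common interface could look like (the method set is illustrative, not the project's final design):

import (
	"context"
	"io"
)

// CloudStorageClient abstracts the provider-specific storage/retrieval
// operations that S3, GCS, and future backends would each implement.
type CloudStorageClient interface {
	Upload(ctx context.Context, key string, r io.Reader) error
	Download(ctx context.Context, key string) (io.ReadCloser, error)
	Delete(ctx context.Context, key string) error
	List(ctx context.Context, prefix string) ([]string, error)
}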

5. A separate FileLog for GCS could be avoided - it should be available in the file package itself.

  • I agree that duplicating the FileLog structure across providers creates unnecessary redundancy. Consolidating it into a common structure in the file package would be beneficial; however, that would require coordinated changes across all storage providers (S3, SFTP, and GCS). For better change management, and to keep this PR focused on adding GCS integration, I propose addressing this in a separate follow-up PR that properly refactors logging across all providers.

6. The definition of standard constants like SUCCESS and ERROR looks off - I understand that "a little copying is better than a little dependency", but we could have these in the file package, which we'd already be importing in the various implementations.

  • I agree that standardizing constants like statusSuccess and statusErr in the main file package would improve consistency and reduce duplication across storage implementations. Similar to the other architectural suggestions, this would be best addressed as part of a broader refactoring effort affecting all file storage implementations.

7. Defining histogram buckets within the implementation package should be avoided.

  • I agree that hardcoding histogram bucket definitions within implementation packages creates inconsistency. Moving these definitions to the main file package would ensure standardized metrics across all storage implementations and should be included in our planned refactoring PR.

I have created issue #2433 capturing all the changes you suggested to improve the file package implementation. Once GCS is merged, and before adding any new file store, our core focus will be to refactor the current implementation for all file stores.

@Umang01-hash (Member)

@Suryakantdsa @akshat-kumar-singhal An update here: I have made some changes to how metrics and logs are handled by the different file systems:

  • Extracted logging and metrics functionality to the common file package.
  • Standardized constants like StatusSuccess and StatusError in the file package.
  • Moved histogram bucket definitions to the main file package for consistency.

The GCS implementation now uses these common components. You can check these in the new PR: #2435.

I think we can now review and merge that. The next set of things to refactor in the file systems will be:

  • Make the other file systems (S3, SFTP, FTP) use the common logger and metrics components of the file package.
  • Refactor the file-system implementation to separate common operations from provider-specific storage/retrieval.
  • Create a unified interface that all providers (S3, GCS, FTP, etc.) can implement.

The current changes have significantly improved code organization while maintaining compatibility with existing implementations. After merging this PR, we can focus on the deeper architectural refactoring in a separate PR to ensure proper separation of concerns and standardization across all storage providers.

@Umang01-hash (Member)

@Suryakantdsa One thing that can be resolved now:

These methods (Seek, ReadAt, WriteAt) should be fully implemented for the GCS provider rather than returning "not supported" errors:

func (*File) Seek(_ int64, _ int) (int64, error) {
	// Not supported: Seek requires reopening with range.
	return 0, errSeekNotSupported
}

func (*File) ReadAt(_ []byte, _ int64) (int, error) {
	return 0, errReadAtNotSupported
}

func (*File) WriteAt(_ []byte, _ int64) (int, error) {
	return 0, errWriteAtNotSupported
}

While GCS (like S3) doesn't natively support random-access operations, we can implement these methods using patterns similar to the S3 implementation (a rough sketch follows below):

  1. For Seek(): maintain an internal offset and implement the standard seek behaviors (SEEK_SET, SEEK_CUR, SEEK_END)
  2. For ReadAt(): download only the necessary portion of the file using GCS range requests
  3. For WriteAt(): download the entire file, modify the content at the offset, then re-upload it

This would provide a consistent API across storage providers and let users call these methods regardless of the underlying storage system. Please follow the S3 implementation pattern, with appropriate observability via the common file.OperationObservability helper.
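
A rough sketch of the suggested approach, assuming a hypothetical gcsFile wrapper around *storage.ObjectHandle (names and error handling are illustrative, not the final implementation):

package gcs

import (
	"context"
	"errors"
	"io"

	"cloud.google.com/go/storage"
)

type gcsFile struct {
	ctx    context.Context
	obj    *storage.ObjectHandle
	offset int64 // internal offset maintained for Seek
}

// Seek updates the internal offset; SeekEnd resolves the object size first.
func (f *gcsFile) Seek(offset int64, whence int) (int64, error) {
	var base int64

	switch whence {
	case io.SeekStart:
		base = 0
	case io.SeekCurrent:
		base = f.offset
	case io.SeekEnd:
		attrs, err := f.obj.Attrs(f.ctx)
		if err != nil {
			return 0, err
		}
		base = attrs.Size
	default:
		return 0, errors.New("gcs: invalid whence")
	}

	if base+offset < 0 {
		return 0, errors.New("gcs: negative position")
	}

	f.offset = base + offset
	return f.offset, nil
}

// ReadAt downloads only the requested byte range via a GCS range request.
func (f *gcsFile) ReadAt(p []byte, off int64) (int, error) {
	r, err := f.obj.NewRangeReader(f.ctx, off, int64(len(p)))
	if err != nil {
		return 0, err
	}
	defer r.Close()

	return io.ReadFull(r, p) // errors if fewer than len(p) bytes are available
}

// WriteAt downloads the whole object, patches it at the offset, re-uploads.
func (f *gcsFile) WriteAt(p []byte, off int64) (int, error) {
	r, err := f.obj.NewReader(f.ctx)
	if err != nil {
		return 0, err
	}

	data, err := io.ReadAll(r)
	r.Close()
	if err != nil {
		return 0, err
	}

	if end := off + int64(len(p)); end > int64(len(data)) {
		grown := make([]byte, end) // grow the buffer when writing past EOF
		copy(grown, data)
		data = grown
	}
	copy(data[off:], p)

	w := f.obj.NewWriter(f.ctx)
	if _, err := w.Write(data); err != nil {
		_ = w.Close()
		return 0, err
	}

	return len(p), w.Close()
}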

Please proceed with these changes here, and I will pick up updates on the new PR #2435 from here on.

@Suryakantdsa (Author)

Ok, I will proceed with them.

Thanks @Umang01-hash!

@Suryakantdsa (Author)

Hi @Umang01-hash,

I've implemented the Seek, ReadAt, and WriteAt methods for the GCS provider as requested.
Please take a look when you get a chance.

Thanks!

@Umang01-hash (Member)

@Suryakantdsa Thank you for your contribution. Since we are now continuing the review and enhancements in the new PR #2435, I am closing this PR for now. We will continue to resolve any review comments there and hope to merge this feature soon!

@Suryakantdsa (Author)

Thanks for the update! 😊 @Umang01-hash

Kinda sad to see this one closed without merging 😅, but I totally get it.
Excited to see things moving forward in #2435!
Really appreciate the feedback and the chance to contribute — can’t wait to see it merged soon!
